Correlation based speech-video synchronization
Abstract
This paper presents a novel lip synchronization technique that investigates the correlation between speech and lip movements. First, the speech signal is represented by a nonlinear time-varying model consisting of a sum of AM–FM signals, each of which models a single formant frequency. The model is realized using a Taylor series expansion in a way that relates the lip shape (width and height) to the speech amplitude and instantaneous frequency. Using the lip width and height, a semi-speech signal is generated and correlated with the original speech signal over a span of delays, and the delay between the speech and the video is then estimated. Using real and noisy data from the VidTIMIT and in-house databases, the proposed method was able to estimate small delays of 0.01–0.1 s in the case of noise-less and noisy signals respectively with a maximum absolute error of 0.0022 s. © 2011 Elsevier B.V. All rights reserved.
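The delay-estimation step described in the abstract (correlating a lip-derived signal with the speech signal over a span of delays and taking the best match) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two signals have already been resampled to a common rate and have equal length, it uses a plain normalized cross-correlation, and the function name `estimate_delay`, its parameters, and the synthetic test signal are all hypothetical. The AM–FM modelling and the construction of the semi-speech signal from lip width and height are not reproduced here.

```python
import numpy as np


def estimate_delay(reference, delayed, rate_hz, max_delay_s):
    """Return the lag in seconds at which `delayed` best matches `reference`.

    A positive result means `delayed` lags behind `reference`.
    Both inputs are assumed to be 1-D arrays of equal length sampled at rate_hz.
    """
    # Standardize both signals so the score is a correlation coefficient.
    ref = (reference - reference.mean()) / (reference.std() + 1e-12)
    dly = (delayed - delayed.mean()) / (delayed.std() + 1e-12)

    max_lag = int(round(max_delay_s * rate_hz))
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        # Align ref[n] with dly[n + lag] and average over the overlap.
        if lag >= 0:
            a, b = ref[:len(ref) - lag], dly[lag:]
        else:
            a, b = ref[-lag:], dly[:len(dly) + lag]
        scores.append(np.dot(a, b) / len(a))

    best_lag = lags[int(np.argmax(scores))]
    return best_lag / rate_hz


# Synthetic check (assumed data): the second signal is the first one
# shifted later by 5 samples at 100 Hz, i.e. by 0.05 s.
rng = np.random.default_rng(0)
base = rng.standard_normal(2100)
speech_like = base[50:2050]
lip_like = base[45:2045]  # lip_like[n] == speech_like[n - 5]
estimated = estimate_delay(speech_like, lip_like, rate_hz=100.0, max_delay_s=0.2)
print(estimated)
```

Restricting the search to a window of plausible lags (`max_delay_s`) mirrors the "span of delays" in the abstract and keeps the cost proportional to the window size rather than to the full signal length.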
Similar resources
Speech-Video Synchronization Using Lips Movements and Speech Envelope Correlation
In this paper, we propose a novel correlation based method for speech-video synchronization (synch) and relationship classification. The method uses the envelope of the speech signal and data extracted from the lips movement. Firstly, a nonlinear-time-varying model is considered to represent the speech signal as a sum of amplitude and frequency modulated (AM-FM) signals. Each AM-FM signal, in t...
A Survey – Audio and Video Synchronization
Audio and video synchronization is essential. Loss of synchronization between image and sound continues to disturb viewers and irritate broadcasters. The demand is to ensure synchronization without altering content while still keeping cost low. The objective of synchronization is to align audio and video signals that are processed separately. T...
On Using Digital Speech Processing Techniques for Synchronization in Heterogeneous Teleconferencing
As the popularity of multi-functional communication devices grows, traditional audio conferencing may now involve heterogeneous teleconferencing devices, including POTS phones, VoIP phones, dual-mode smart phones, and so on. During a multi-party audio conference involving heterogeneous devices, it is possible that a video conference is held concurrently involving a subset of devices capable of pr...
ConCor+: Robust and confident video synchronization using consensus-based Cross-correlation
Consensus-based Cross-correlation (ConCor) is a recently presented algorithm for robust synchronization of noisy and corrupted signals. ConCor has a number of interdependent parameters that need to be set correctly to guarantee good performance. In this paper we analyse the effects of the individual parameters on ConCor’s behaviour and performance. As a second contribution, we show that a param...
Text Driven Face-Video Synthesis Using GMM and Spatial Correlation
Liveness detection is increasingly planned to be incorporated into biometric systems to reduce the risk of spoofing and impersonation. Some of the techniques used include detection of head motion while posing or speaking, iris size under varying illumination, fingerprint sweat, text-prompted speech, speech-to-lip motion synchronization, etc. In this paper, we propose to build a biometric signal ...
Journal title:
- Pattern Recognition Letters
Volume 32, Issue
Pages -
Publication date 2011